Rule-based Coreference Resolution in German Historic Novels

نویسندگان

  • Markus Krug
  • Frank Puppe
  • Fotis Jannidis
  • Luisa Macharowsky
  • Isabella Reger
  • Lukas Weimar
چکیده

Coreference resolution (CR) is a key task in the automated analysis of characters in stories. Standard CR systems usually trained on newspaper texts have difficulties with literary texts, even with novels; a comparison with newspaper texts showed that average sentence length is greater in novels and the number of pronouns, as well as the percentage of direct speech is higher. We report promising evaluation results for a rule-based system similar to [Lee et al. 2011], but tailored to the domain which recognizes coreference chains in novels much better than CR systems like CorZu. Rule-based systems performed best on the CoNLL 2011 challenge [Pradhan et al. 2011]. Recent work in machine learning showed similar results as rule-based systems [Durett et al. 2013]. The latter has the advantage that its explanation component facilitates a fine grained error analysis for incremental refinement of

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

Corefrence resolution with deep learning in the Persian Labnguage

Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...

متن کامل

Real Anaphora Resolution is Hard .

We introduce a system for anaphora resolution for German that uses various resources in order to develop a real system as opposed to systems based on idealized assumptions, e.g. the use of true mentions only or perfect parse trees and perfect morphology. The components that we use to replace such idealizations comprise a full-fledged morphology, a Wikipedia-based named entity recognition, a rul...

متن کامل

Extending BART to Provide a Coreference Resolution System for German

We present a flexible toolkit-based approach to automatic coreference resolution on German text. We start with our previous work aimed at reimplementing the system from Soon et al. (2001) for English, and extend it to duplicate a version of the state-of-the-art proposal from Klenner and Ailloud (2009). Evaluation performed on a benchmarking dataset, namely the TüBa-D/Z corpus (Hinrichs et al., ...

متن کامل

CORBON 2017 Shared Task: Projection-Based Coreference Resolution

The CORBON 2017 Shared Task, organised as part of the Coreference Resolution Beyond OntoNotes workshop at EACL 2017, presented a new challenge for multilingual coreference resolution: we offer a projection-based setting in which one is supposed to build a coreference resolver for a new language exploiting little or even no knowledge of it, with our languages of interest being German and Russian...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015